A well-defined HDX-MS experiment
This vignette describes how to analyse time-resolved differential HDX-MS
experiments. The key element is at least two conditions, e.g. apo + antibody,
apo + small molecule, or protein closed + protein open. The experiment can
be replicated, though if enough time points are analysed (>= 3) then
significant results can occasionally be obtained without replication. The data
provided should be centroid-centric. This package does not yet support analysis
directly from raw spectra. Typically the data will be provided as a .csv from
tools such as DynamX or HDExaminer.
Main elements of the package
The package relies on Bioconductor infrastructure so that it integrates with
other data types and can benefit from advances in other fields of mass spectrometry.
There are package-specific objects, classes and methods, but importantly there is
reuse of classes from quantitative proteomics, mainly the QFeatures
object, which builds on the SummarizedExperiment class for mass-spectrometry data.
The focus of this package is on testing and on visualisation of the testing results.
Data
We will begin with a structural variant experiment in which MBP and a structural
variant were mixed in different proportions. HDX-MS was performed on these samples
and we expect to see reproducible but subtle differences. We first load the data
from the package; it is in .csv format.
MBPpath <- system.file("extdata", "MBP.csv", package = "hdxstats")
We can now read in the .csv file and have a quick look at its contents.
MBP <- read.csv(MBPpath)
head(MBP) # have a look
## hx_sample pep_start pep_end pep_sequence pep_charge d confidence score
## 1 10% 19 30 VIWINGDKGYNG 2 2.120 medium 0.8686
## 2 10% 19 30 VIWINGDKGYNG 2 2.146 medium 0.8173
## 3 10% 19 30 VIWINGDKGYNG 2 2.143 medium 0.8839
## 4 10% 19 30 VIWINGDKGYNG 2 NA <NA> NA
## 5 10% 19 30 VIWINGDKGYNG 2 NA <NA> NA
## 6 10% 19 30 VIWINGDKGYNG 2 NA <NA> NA
## hx_time time_unit replicate_cnt
## 1 30 s 1
## 2 30 s 2
## 3 30 s 3
## 4 30 s 4
## 5 30 s 5
## 6 30 s 6
length(unique(MBP$pep_sequence)) # peptide sequences
## [1] 115
Let us have a quick visualisation of some of the data so that we can see some of
its features.
library(dplyr)        # filter() and %>%
library(ggplot2)      # plotting
library(RColorBrewer) # brewer.pal()

# first peptide sequence, charge state 2
filter(MBP, pep_sequence == unique(MBP$pep_sequence)[1], pep_charge == 2) %>%
    ggplot(aes(x = hx_time, y = d, group = factor(replicate_cnt),
               color = factor(hx_sample,
                              unique(MBP$hx_sample)[c(7,5,1,2,3,4,6)]))) +
    theme_classic() + geom_point(size = 2) +
    scale_color_manual(values = brewer.pal(n = 7, name = "Set2")) +
    labs(color = "experiment", x = "Deuterium Exposure", y = "Deuterium incorporation")
## Warning: Removed 96 rows containing missing values (geom_point).
We can see that the units of the time dimension are in seconds and that
deuterium incorporation has been normalized into Daltons.
Parsing to an object of class QFeatures
Working directly from a .csv is likely to cause issues downstream. First, we run
the risk of accidentally changing the data or corrupting the file in some way.
Second, every .csv is formatted slightly differently, so building extensible
tools for these files is inefficient. Furthermore, working with a generic
class used in other mass-spectrometry fields can speed up analysis and adoption
of new methods. We will work with the QFeatures class from the QFeatures package,
as it is a powerful and scalable way to store quantitative mass-spectrometry data.
The data are stored in long format rather than wide format, so we first
switch the data to wide format.
MBP_wide <- pivot_wider(data.frame(MBP),
                        values_from = d,
                        names_from = c("hx_time", "replicate_cnt", "hx_sample"),
                        id_cols = c("pep_sequence", "pep_charge"))
head(MBP_wide)
## # A tibble: 6 x 198
## pep_sequence pep_charge `30_1_10%` `30_2_10%` `30_3_10%` `30_4_10%` `30_5_10%`
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 VIWINGDKGYNG 2 2.12 2.15 2.14 NA NA
## 2 VIWINGDKGYN~ 2 2.12 2.12 2.11 NA NA
## 3 GDKGYNGLAEVG 3 0.552 0.555 0.553 NA NA
## 4 LAEVGKKFEKD~ 4 2.41 2.36 2.42 NA NA
## 5 AEVGKKFEKDT~ 4 0.458 0.425 0.573 NA NA
## 6 TVEHPDKL 3 1.43 1.44 1.44 NA NA
## # ... with 191 more variables: 30_6_10% <dbl>, 30_7_10% <dbl>, 240_1_10% <dbl>,
## # 240_2_10% <dbl>, 240_3_10% <dbl>, 240_4_10% <dbl>, 240_5_10% <dbl>,
## # 240_6_10% <dbl>, 240_7_10% <dbl>, 1800_1_10% <dbl>, 1800_2_10% <dbl>,
## # 1800_3_10% <dbl>, 1800_4_10% <dbl>, 1800_5_10% <dbl>, 1800_6_10% <dbl>,
## # 1800_7_10% <dbl>, 14400_1_10% <dbl>, 14400_2_10% <dbl>, 14400_3_10% <dbl>,
## # 14400_4_10% <dbl>, 14400_5_10% <dbl>, 14400_6_10% <dbl>, 14400_7_10% <dbl>,
## # 30_1_15% <dbl>, 30_2_15% <dbl>, 30_3_15% <dbl>, 30_4_15% <dbl>, ...
We notice that there are many columns that contain only NAs. The following code
chunk removes these columns.
MBP_wide <- MBP_wide[, colSums(is.na(MBP_wide)) != nrow(MBP_wide)]
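The filter above keeps a column unless every one of its entries is missing, so columns with only some NAs survive. A minimal sketch on a toy data frame may make this clearer; the values and the names `toy`, `n_missing` and `toy_clean` are made up for illustration and are not part of the package.

```r
# Toy wide table (hypothetical values): column "c" holds no measurements at all
toy <- data.frame(a = c(1.2, NA, 0.8),
                  b = c(NA, 2.1, 2.0),
                  c = c(NA_real_, NA_real_, NA_real_))

# Number of missing values per column
n_missing <- colSums(is.na(toy))

# Keep a column unless every one of its rows is missing
toy_clean <- toy[, n_missing != nrow(toy), drop = FALSE]
names(toy_clean)
```

Columns `a` and `b` are retained even though each has one missing value; only the fully empty column `c` is dropped.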
We also note that the column names are not very informative. We are going to format
them in a very specific way so that later functions can automatically infer the design
from the column names. We use the format X(time)rep(replicate)cond(condition).
colnames(MBP_wide)[-c(1,2)]
## [1] "30_1_10%" "30_2_10%" "30_3_10%" "240_1_10%"
## [5] "240_2_10%" "240_3_10%" "1800_1_10%" "1800_2_10%"
## [9] "1800_3_10%" "14400_1_10%" "14400_2_10%" "14400_3_10%"
## [13] "30_1_15%" "30_2_15%" "30_3_15%" "240_1_15%"
## [17] "240_2_15%" "240_3_15%" "1800_1_15%" "1800_2_15%"
## [21] "1800_3_15%" "14400_1_15%" "14400_2_15%" "14400_3_15%"
## [25] "30_1_20%" "30_2_20%" "30_3_20%" "240_1_20%"
## [29] "240_2_20%" "240_3_20%" "1800_1_20%" "1800_2_20%"
## [33] "1800_3_20%" "14400_1_20%" "14400_2_20%" "14400_3_20%"
## [37] "30_1_25%" "30_2_25%" "30_3_25%" "240_1_25%"
## [41] "240_2_25%" "240_3_25%" "1800_1_25%" "1800_2_25%"
## [45] "1800_3_25%" "14400_1_25%" "14400_2_25%" "14400_3_25%"
## [49] "30_1_5%" "30_2_5%" "30_3_5%" "240_1_5%"
## [53] "240_2_5%" "240_3_5%" "1800_1_5%" "1800_2_5%"
## [57] "1800_3_5%" "14400_1_5%" "14400_2_5%" "14400_3_5%"
## [61] "30_1_W169G" "30_2_W169G" "30_3_W169G" "240_1_W169G"
## [65] "240_2_W169G" "240_3_W169G" "1800_1_W169G" "1800_2_W169G"
## [69] "1800_3_W169G" "14400_1_W169G" "14400_2_W169G" "14400_3_W169G"
## [73] "30_1_WT Null" "30_2_WT Null" "30_3_WT Null" "30_4_WT Null"
## [77] "30_5_WT Null" "30_6_WT Null" "30_7_WT Null" "240_1_WT Null"
## [81] "240_2_WT Null" "240_3_WT Null" "240_4_WT Null" "240_5_WT Null"
## [85] "240_6_WT Null" "240_7_WT Null" "1800_1_WT Null" "1800_2_WT Null"
## [89] "1800_3_WT Null" "1800_4_WT Null" "1800_5_WT Null" "1800_6_WT Null"
## [93] "1800_7_WT Null" "14400_1_WT Null" "14400_2_WT Null" "14400_3_WT Null"
## [97] "14400_4_WT Null" "14400_5_WT Null" "14400_6_WT Null" "14400_7_WT Null"
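# prefix with "X" and mark the replicate: "30_1_10%" becomes "X30rep1_10%"
# (this relies on every time point ending in 0)
new.colnames <- gsub("0_", "0rep", paste0("X", colnames(MBP_wide)[-c(1,2)]))
# mark the condition: "X30rep1_10%" becomes "X30rep1cond10%"
new.colnames <- gsub("_", "cond", new.colnames)
# remove the % signs
new.colnames <- gsub("%", "", new.colnames)
# drop everything after a space, so "WT Null" becomes "WT"
# (Null could get confusing later and WT is clear)
new.colnames <- gsub(" .*", "", new.colnames)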
new.colnames
## [1] "X30rep1cond10" "X30rep2cond10" "X30rep3cond10"
## [4] "X240rep1cond10" "X240rep2cond10" "X240rep3cond10"
## [7] "X1800rep1cond10" "X1800rep2cond10" "X1800rep3cond10"
## [10] "X14400rep1cond10" "X14400rep2cond10" "X14400rep3cond10"
## [13] "X30rep1cond15" "X30rep2cond15" "X30rep3cond15"
## [16] "X240rep1cond15" "X240rep2cond15" "X240rep3cond15"
## [19] "X1800rep1cond15" "X1800rep2cond15" "X1800rep3cond15"
## [22] "X14400rep1cond15" "X14400rep2cond15" "X14400rep3cond15"
## [25] "X30rep1cond20" "X30rep2cond20" "X30rep3cond20"
## [28] "X240rep1cond20" "X240rep2cond20" "X240rep3cond20"
## [31] "X1800rep1cond20" "X1800rep2cond20" "X1800rep3cond20"
## [34] "X14400rep1cond20" "X14400rep2cond20" "X14400rep3cond20"
## [37] "X30rep1cond25" "X30rep2cond25" "X30rep3cond25"
## [40] "X240rep1cond25" "X240rep2cond25" "X240rep3cond25"
## [43] "X1800rep1cond25" "X1800rep2cond25" "X1800rep3cond25"
## [46] "X14400rep1cond25" "X14400rep2cond25" "X14400rep3cond25"
## [49] "X30rep1cond5" "X30rep2cond5" "X30rep3cond5"
## [52] "X240rep1cond5" "X240rep2cond5" "X240rep3cond5"
## [55] "X1800rep1cond5" "X1800rep2cond5" "X1800rep3cond5"
## [58] "X14400rep1cond5" "X14400rep2cond5" "X14400rep3cond5"
## [61] "X30rep1condW169G" "X30rep2condW169G" "X30rep3condW169G"
## [64] "X240rep1condW169G" "X240rep2condW169G" "X240rep3condW169G"
## [67] "X1800rep1condW169G" "X1800rep2condW169G" "X1800rep3condW169G"
## [70] "X14400rep1condW169G" "X14400rep2condW169G" "X14400rep3condW169G"
## [73] "X30rep1condWT" "X30rep2condWT" "X30rep3condWT"
## [76] "X30rep4condWT" "X30rep5condWT" "X30rep6condWT"
## [79] "X30rep7condWT" "X240rep1condWT" "X240rep2condWT"
## [82] "X240rep3condWT" "X240rep4condWT" "X240rep5condWT"
## [85] "X240rep6condWT" "X240rep7condWT" "X1800rep1condWT"
## [88] "X1800rep2condWT" "X1800rep3condWT" "X1800rep4condWT"
## [91] "X1800rep5condWT" "X1800rep6condWT" "X1800rep7condWT"
## [94] "X14400rep1condWT" "X14400rep2condWT" "X14400rep3condWT"
## [97] "X14400rep4condWT" "X14400rep5condWT" "X14400rep6condWT"
## [100] "X14400rep7condWT"
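To see why this naming scheme is convenient, here is a rough sketch of how a design table could be recovered from such names with one regular expression. This is not the package's own parser; `pattern` and `design` are hypothetical names used only for illustration.

```r
# A few names in the X(time)rep(replicate)cond(condition) format
nms <- c("X30rep1cond10", "X14400rep3condW169G", "X240rep7condWT")

# One regular expression with three capture groups: time, replicate, condition
pattern <- "^X([0-9]+)rep([0-9]+)cond(.+)$"
stopifnot(all(grepl(pattern, nms)))

# Pull each capture group out into its own column
design <- data.frame(
  timepoint = as.numeric(sub(pattern, "\\1", nms)),
  replicate = as.integer(sub(pattern, "\\2", nms)),
  condition = sub(pattern, "\\3", nms),
  row.names = nms
)
design
```

Because every name matches the same pattern, a malformed column name is caught by the `stopifnot()` check before any design is built.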
We will now parse the data into an object of class QFeatures; we have provided
a function in the package to assist with this. If you want to do this yourself,
use the readQFeatures function from the QFeatures package.
MBPqDF <- parseDeutData(object = DataFrame(MBP_wide),
                        design = new.colnames,
                        quantcol = 3:102)
Heatmap visualisations of HDX data
To help us get used to QFeatures objects, we show how to generate a heatmap
of these data from this object:
pheatmap(t(assay(MBPqDF)),
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         color = brewer.pal(n = 9, name = "BuPu"),
         main = "Structural variant deuterium incorporation heatmap",
         fontsize = 14,
         legend_breaks = c(0, 2, 4, 6, 8, 10, 12, max(assay(MBPqDF))),
         legend_labels = c("0", "2", "4", "6", "8","10", "12", "Incorporation"))
If you prefer to have the start-to-end residue numbers in the heatmap instead,
you can change the plot as follows:
regions <- unique(MBP[,c("pep_start", "pep_end")])
xannot <- paste0("[", regions[,1], ",", regions[,2], "]")
pheatmap(t(assay(MBPqDF)),
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         color = brewer.pal(n = 9, name = "BuPu"),
         main = "Structural variant deuterium incorporation heatmap",
         fontsize = 14,
         legend_breaks = c(0, 2, 4, 6, 8, 10, 12, max(assay(MBPqDF))),
         legend_labels = c("0", "2", "4", "6", "8","10", "12", "Incorporation"),
         labels_col = xannot)

Functional data analysis of HDX-MS data
The hdxstats package uses an empirical Bayes functional approach to analyse
the data. We explain this approach in steps. First, we fit the parametric model
to the data, which also allows us to explore the HdxStatModel class.
res <- differentialUptakeKinetics(object = MBPqDF[,1:100], # provide a QFeatures object
                                  feature = rownames(MBPqDF)[[1]][37], # which peptide do we fit
                                  start = list(a = NULL, b = 0.0001, d = NULL, p = 1)) # starting parameter guesses
Here, we see the HdxStatModel class, and that a Functional Model was applied
to the data and a total of 7 models were fitted.
res
## Object of class "HdxStatModel"
## Method: Functional Model
## Fitted 7
The nullmodel and alternative slots of an instance of HdxStatModel provide
the underlying fitted models. The method and formula slots provide vital
information about what analysis was performed. The vis slot provides a ggplot
object so that we can visualise the functional fits.
res@vis

Since this is a ggplot object, we can customise it with the usual grammar of graphics.
res@vis + scale_color_manual(values = brewer.pal(n = 8, name = "Set2"))
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
A number of standard methods are available and can be applied to an HdxStatModel
object; these extend the usual base stats methods. These include:
anova: An analysis of variance
logLik: The log-likelihood of all the fitted models
residuals: The residuals for the fitted models
vcov: The variance-covariance matrix between parameters of the models
likRatio: The likelihood ratio between null and alternative models
wilk: Applies Wilks' theorem to obtain a p-value from the likelihood ratio
coef: The fitted model coefficients
deviance: The deviance of the fitted models
summary: The statistical summary of the models.
anova(res)
## Analysis of Variance Table
##
## Model 1: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
## Model 2: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
## Model 3: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
## Model 4: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
## Model 5: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
## Model 6: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
## Model 7: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
## Model 8: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
## Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
## 1 96 14.0906
## 2 8 0.0786 88 14.0120 16.2033 0.0001427 ***
## 3 8 0.0964 0 0.0000
## 4 8 0.0619 0 0.0000
## 5 8 0.0435 0 0.0000
## 6 8 0.0889 0 0.0000
## 7 8 0.1386 0 0.0000
## 8 24 0.1983 -16 -0.0597 0.2155 0.9955920
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
logLik(res)
## null alt1 alt2 alt3 alt4 alt5 alt6
## -43.910623 13.141385 11.918525 14.576956 16.697584 12.405724 9.739545
## alt7
## 29.570914
residuals(res)
## $null
## [1] -0.1087176969 -0.2047176969 -0.1207176969 -0.0461908281 -0.0721908281
## [6] -0.0101908281 -0.2657724078 -0.2317724078 -0.3037724078 -0.0460995039
## [11] -0.1310995039 -0.1080995039 -0.0707176969 -0.0977176969 -0.1837176969
## [16] -0.0121908281 0.0208091719 0.0208091719 -0.2757724078 -0.1597724078
## [21] -0.1947724078 0.0349004961 -0.0130995039 0.0619004961 -0.0157176969
## [26] -0.1057176969 -0.0147176969 0.0318091719 0.0178091719 0.0498091719
## [31] -0.1717724078 -0.1347724078 -0.1887724078 0.0049004961 0.0679004961
## [36] 0.0009004961 -0.0247176969 -0.0757176969 -0.1397176969 0.0018091719
## [41] 0.0338091719 0.0568091719 -0.0857724078 -0.1017724078 -0.0847724078
## [46] -0.0630995039 -0.0120995039 0.0369004961 -0.1137176969 -0.1017176969
## [51] -0.1047176969 -0.0371908281 -0.0691908281 -0.0251908281 -0.2237724078
## [56] -0.2737724078 -0.3757724078 -0.0840995039 0.0069004961 -0.0650995039
## [61] 0.3902823031 0.4472823031 0.4262823031 1.2018091719 1.2488091719
## [66] 1.1888091719 1.3082275922 1.2262275922 1.3102275922 0.6789004961
## [71] 0.6759004961 0.6619004961 -0.1687176969 -0.1247176969 -0.1197176969
## [76] -0.1237176969 -0.1107176969 -0.2357176969 -0.1797176969 -0.1191908281
## [81] -0.2011908281 -0.1491908281 -0.1421908281 -0.1591908281 -0.1241908281
## [86] -0.1971908281 -0.4717724078 -0.4527724078 -0.3827724078 -0.4057724078
## [91] -0.4807724078 -0.4207724078 -0.4237724078 -0.2140995039 -0.1950995039
## [96] -0.1260995039 -0.1060995039 -0.1350995039 -0.1900995039 -0.0950995039
## attr(,"label")
## [1] "Residuals"
##
## $alt1
## [1] -0.009322888 -0.105322888 -0.021322888 0.106288152 0.080288152
## [6] 0.142288152 -0.082503400 -0.048503400 -0.120503400 0.070940082
## [11] -0.014059918 0.008940082
## attr(,"label")
## [1] "Residuals"
##
## $alt2
## [1] -0.003592973 -0.030592973 -0.116592973 0.099392618 0.132392618
## [6] 0.132392618 -0.157218529 -0.041218529 -0.076218529 0.031475396
## [11] -0.016524604 0.058475396
## attr(,"label")
## [1] "Residuals"
##
## $alt3
## [1] -1.088794e-02 -1.008879e-01 -9.887937e-03 9.663941e-02 8.263941e-02
## [6] 1.146394e-01 -8.121607e-02 -4.421607e-02 -9.821607e-02 -1.656477e-05
## [11] 6.298344e-02 -4.016565e-03
## attr(,"label")
## [1] "Residuals"
##
## $alt4
## [1] 0.02237014 -0.02862986 -0.09262986 0.04539864 0.07739864 0.10039864
## [7] -0.05170530 -0.06770530 -0.05070530 -0.03619616 0.01480384 0.06380384
## attr(,"label")
## [1] "Residuals"
##
## $alt5
## [1] -0.055996734 -0.043996734 -0.046996734 0.122628468 0.090628468
## [6] 0.134628468 -0.018500046 -0.068500046 -0.170500046 -0.014309611
## [11] 0.076690389 0.004690389
## attr(,"label")
## [1] "Residuals"
##
## $alt6
## [1] -0.10759976 -0.05059976 -0.07159976 0.14770160 0.19470160 0.13470160
## [7] -0.08134445 -0.16334445 -0.07934445 0.03046397 0.02746397 0.01346397
## attr(,"label")
## [1] "Residuals"
##
## $alt7
## [1] -0.065672255 -0.021672255 -0.016672255 -0.020672255 -0.007672255
## [6] -0.132672255 -0.076672255 0.151605211 0.069605211 0.121605211
## [11] 0.128605211 0.111605211 0.146605211 0.073605211 -0.118700059
## [16] -0.099700059 -0.029700059 -0.052700059 -0.127700059 -0.067700059
## [21] -0.070700059 -0.039361618 -0.020361618 0.048638382 0.068638382
## [26] 0.039638382 -0.015361618 0.079638382
## attr(,"label")
## [1] "Residuals"
vcov(res)
## $nullvcov
## a b d p
## a 51845.96221 -40.30049781 -564.4474104 -55.77619164
## b -40.30050 0.03142629 0.4338129 0.04312009
## d -564.44741 0.43381288 6.3931296 0.61872995
## p -55.77619 0.04312009 0.6187300 0.06056114
##
## $altvcov
## $altvcov[[1]]
## a b d p
## a 10444961.9522 -703.03358414 -6015.6075822 -673.25122429
## b -703.0336 0.04732351 0.4042563 0.04528165
## d -6015.6076 0.40425632 3.5873568 0.39401021
## p -673.2512 0.04528165 0.3940102 0.04372867
##
## $altvcov[[2]]
## a b d p
## a 19354023.7673 -1.007942e+03 -8697.0872334 -991.86073521
## b -1007.9424 5.249590e-02 0.4522867 0.05162067
## d -8697.0872 4.522867e-01 4.0499729 0.45304650
## p -991.8607 5.162067e-02 0.4530465 0.05122726
##
## $altvcov[[3]]
## a b d p
## a 4552740.6012 -368.82485376 -3266.9390972 -383.37329769
## b -368.8249 0.02988215 0.2641473 0.03102956
## d -3266.9391 0.26414732 2.4323079 0.27978712
## p -383.3733 0.03102956 0.2797871 0.03254357
##
## $altvcov[[4]]
## a b d p
## a 6864.520437 -6.441906003 -106.11073025 -10.722778012
## b -6.441906 0.006083713 0.09794035 0.009982697
## d -106.110730 0.097940348 1.71150119 0.169125295
## p -10.722778 0.009982697 0.16912530 0.016917195
##
## $altvcov[[5]]
## a b d p
## a 13387360.6872 -647.49396247 -5681.5798697 -770.53586039
## b -647.4940 0.03131890 0.2743356 0.03723897
## d -5681.5799 0.27433561 2.5117403 0.33309240
## p -770.5359 0.03723897 0.3330924 0.04473613
##
## $altvcov[[6]]
## a b d p
## a 2.9829656 0.135178730 -2.15569792 -0.142999918
## b 0.1351787 0.006239611 -0.09913449 -0.006522984
## d -2.1556979 -0.099134493 1.59005367 0.103073198
## p -0.1429999 -0.006522984 0.10307320 0.006924275
##
## $altvcov[[7]]
## a b d p
## a 1889281.98027 -99.552456909 -927.27826196 -156.27693378
## b -99.55246 0.005246328 0.04874778 0.00822604
## d -927.27826 0.048747779 0.47799148 0.07837851
## p -156.27693 0.008226040 0.07837851 0.01305747
likRatio(res)
## logLR
## 303.9225
wilk(res)
## p-value
## 2.718741e-50
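The likRatio and wilk values above can be reproduced by hand from the printed logLik(res) output, which helps to show what the two methods compute. The sketch below uses base R only. The degrees of freedom are an assumption (7 alternative fits with 4 parameters each, minus the 4 parameters of the null fit); this choice happens to reproduce the printed p-value, but the package's internal calculation is not shown in this vignette.

```r
# Log-likelihoods copied from the logLik(res) output above
ll_null <- -43.910623
ll_alt  <- c(13.141385, 11.918525, 14.576956, 16.697584,
             12.405724, 9.739545, 29.570914)

# Likelihood ratio statistic: twice the gain in log-likelihood
lr <- 2 * (sum(ll_alt) - ll_null)

# Assumed degrees of freedom: 7 alternative fits x 4 parameters,
# minus the 4 parameters of the null fit
df <- length(ll_alt) * 4 - 4

# Wilks' theorem: the statistic is approximately chi-squared under the null
p_value <- pchisq(lr, df = df, lower.tail = FALSE)
c(logLR = lr, p.value = p_value)
```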
coef(res)
## a b d p
## null 32.051041 0.033332159 0.00000000 0.2091504
## alt1 117.106163 0.008312519 0.05395662 0.2067844
## alt2 132.034758 0.007210028 0.10762327 0.2097946
## alt3 103.291695 0.008900141 0.22258216 0.2124511
## alt4 27.362931 0.037597301 0.00000000 0.2150760
## alt5 123.828766 0.006296249 0.37946849 0.2253357
## alt6 8.320486 0.130807205 0.00000000 0.3096142
## alt7 103.413551 0.005789148 0.61994061 0.2476660
deviance(res)
## null alt1 alt2 alt3 alt4 alt5
## 14.09056723 0.07861433 0.09638630 0.06188589 0.04346058 0.08886908
## alt6 alt7
## 0.13859104 0.19831856
summary(res)
## $null
##
## Formula: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 32.05104 227.69708 0.141 0.888
## b 0.03333 0.17727 0.188 0.851
## d 0.00000 2.52846 0.000 1.000
## p 0.20915 0.24609 0.850 0.398
##
## Residual standard error: 0.3831 on 96 degrees of freedom
##
## Number of iterations to convergence: 53
## Achieved convergence tolerance: 1.49e-08
##
##
## $alt1
##
## Formula: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 1.171e+02 3.232e+03 0.036 0.972
## b 8.313e-03 2.175e-01 0.038 0.970
## d 5.396e-02 1.894e+00 0.028 0.978
## p 2.068e-01 2.091e-01 0.989 0.352
##
## Residual standard error: 0.09913 on 8 degrees of freedom
##
## Number of iterations till stop: 98
## Achieved convergence tolerance: 1.49e-08
## Reason stopped: Number of calls to `fcn' has reached or exceeded `maxfev' == 500.
##
##
## $alt2
##
## Formula: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 1.320e+02 4.399e+03 0.030 0.977
## b 7.210e-03 2.291e-01 0.031 0.976
## d 1.076e-01 2.012e+00 0.053 0.959
## p 2.098e-01 2.263e-01 0.927 0.381
##
## Residual standard error: 0.1098 on 8 degrees of freedom
##
## Number of iterations till stop: 98
## Achieved convergence tolerance: 1.49e-08
## Reason stopped: Number of calls to `fcn' has reached or exceeded `maxfev' == 500.
##
##
## $alt3
##
## Formula: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 103.2917 2133.7152 0.048 0.963
## b 0.0089 0.1729 0.051 0.960
## d 0.2226 1.5596 0.143 0.890
## p 0.2125 0.1804 1.178 0.273
##
## Residual standard error: 0.08795 on 8 degrees of freedom
##
## Number of iterations till stop: 97
## Achieved convergence tolerance: 1.49e-08
## Reason stopped: Number of calls to `fcn' has reached or exceeded `maxfev' == 500.
##
##
## $alt4
##
## Formula: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 27.3629 82.8524 0.330 0.750
## b 0.0376 0.0780 0.482 0.643
## d 0.0000 1.3082 0.000 1.000
## p 0.2151 0.1301 1.654 0.137
##
## Residual standard error: 0.07371 on 8 degrees of freedom
##
## Number of iterations to convergence: 58
## Achieved convergence tolerance: 1.49e-08
##
##
## $alt5
##
## Formula: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 1.238e+02 3.659e+03 0.034 0.974
## b 6.296e-03 1.770e-01 0.036 0.972
## d 3.795e-01 1.585e+00 0.239 0.817
## p 2.253e-01 2.115e-01 1.065 0.318
##
## Residual standard error: 0.1054 on 8 degrees of freedom
##
## Number of iterations till stop: 97
## Achieved convergence tolerance: 1.49e-08
## Reason stopped: Number of calls to `fcn' has reached or exceeded `maxfev' == 500.
##
##
## $alt6
##
## Formula: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 8.32049 1.72713 4.818 0.00133 **
## b 0.13081 0.07899 1.656 0.13632
## d 0.00000 1.26097 0.000 1.00000
## p 0.30961 0.08321 3.721 0.00586 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1316 on 8 degrees of freedom
##
## Number of iterations to convergence: 38
## Achieved convergence tolerance: 1.49e-08
##
##
## $alt7
##
## Formula: value ~ a * (1 - exp(-b * (timepoint)^p)) + d
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 1.034e+02 1.375e+03 0.075 0.9407
## b 5.789e-03 7.243e-02 0.080 0.9370
## d 6.199e-01 6.914e-01 0.897 0.3788
## p 2.477e-01 1.143e-01 2.167 0.0403 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0909 on 24 degrees of freedom
##
## Number of iterations till stop: 95
## Achieved convergence tolerance: 1.49e-08
## Reason stopped: Number of calls to `fcn' has reached or exceeded `maxfev' == 500.
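The formula printed in each summary, value ~ a * (1 - exp(-b * (timepoint)^p)) + d, can be evaluated directly to see what a fitted curve looks like. The following is a small sketch in base R; the function name `uptake` is hypothetical, and the coefficients are copied from the coef(res) table for the sixth and seventh alternative fits. Reading the formula, `d` is the value at time zero, `a` the size of the plateau above it, and `b` and `p` control how quickly the plateau is approached.

```r
# Uptake curve in the form printed by summary(res)
uptake <- function(timepoint, a, b, d, p) {
  a * (1 - exp(-b * timepoint^p)) + d
}

timepoints <- c(30, 240, 1800, 14400)  # seconds, as in the MBP data

# Coefficients copied from the coef(res) table (rows alt6 and alt7)
alt6 <- uptake(timepoints, a = 8.320486,   b = 0.130807205, d = 0,          p = 0.3096142)
alt7 <- uptake(timepoints, a = 103.413551, b = 0.005789148, d = 0.61994061, p = 0.2476660)

# Difference between the two fitted curves at each time point
round(alt6 - alt7, 3)
```

Note how very different parameter values can still describe curves of a similar shape over the measured time range, which is one reason the individual parameter standard errors above are large.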
Analysis of a typical HDX-MS experiment
We have seen the basic aspects of our functional modelling approach. We now
wish to roll out our method across all peptides in the experiment. The
fitUptakeKinetics function allows us to apply our modelling approach across
all the peptides in the experiment. We need to provide a QFeatures object
and the features for which we are fitting the model. The design will be extracted
from the column names, or you can provide a design yourself. The parameter
initialisation should also be provided. Sometimes the model cannot be fitted to
the kinetics, either because there is not enough data or because of a lack of
convergence. An error will be reported in these cases, but this should not
concern the user. You may wish to try a few different starting values if an
excessive number of models fail to fit.
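The idea of trying several starting values can be sketched generically with tryCatch. Everything in this block is hypothetical: `fit_with_fallback` is not a package function, and `toy_fit` is a stand-in that only mimics a fit failing for poor starting values. In practice the fitting function would be whatever call you use to fit a single peptide.

```r
# Try each candidate start in turn; return the first fit that succeeds
fit_with_fallback <- function(fit_fun, starts) {
  for (s in starts) {
    fit <- tryCatch(fit_fun(s), error = function(e) NULL)
    if (!is.null(fit)) {
      return(list(fit = fit, start = s))
    }
  }
  NULL  # every start failed
}

# Stand-in for a real model fit (hypothetical): fails unless b is small enough
toy_fit <- function(start) {
  if (start$b > 0.001) stop("singular gradient matrix at initial parameter estimates")
  "fitted"
}

starts <- list(list(b = 0.1), list(b = 0.01), list(b = 0.001))
result <- fit_with_fallback(toy_fit, starts)
result$start$b
```

Recording which start succeeded is useful, since it gives a sensible first guess for the remaining peptides.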
res <- fitUptakeKinetics(object = MBPqDF[,c(1:24)],
                         feature = rownames(MBPqDF[,c(1:24)])[[1]],
                         start = list(a = NULL, b = 0.001, d = NULL, p = 1))
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
The code chunk above returns an object of class HdxStatModels, indicating that a
number of models for the peptides have been fitted. This is simply a holder for a
list of HdxStatModel instances.
res
## Object of class "HdxStatModels"
## Number of models 104
We can easily examine individual fits by going to the underlying HdxStatModel
class:
res@statmodels[[1]]@vis + scale_color_manual(values = brewer.pal(n = 2, name = "Set2"))
## Warning in brewer.pal(n = 2, name = "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
We now wish to apply statistical analysis to these fitted curves. Our approach
is an empirical Bayes testing procedure, which borrows information across peptides
to stabilise variance estimates. Here, we need to provide the original data
that was analysed and the HdxStatModels object. The following code chunk
returns an object of class HdxStatRes. This object tells us that statistical
analysis was performed using our Functional model.
out <- processFunctional(object = MBPqDF[,1:24], params = res)
out
## Object of class "HdxStatRes"
## Analysed using Functional model
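To give some intuition for what "borrowing information across peptides" can mean, here is a generic sketch of variance shrinkage of the kind used in moderated test statistics. It is an illustration only and may differ from what processFunctional does internally; all numbers and names (`s2`, `s2_prior`, `df_prior`, `s2_post`) are made up.

```r
# Per-peptide residual variances and their degrees of freedom (made-up numbers)
s2 <- c(0.0015, 0.0019, 0.0013, 0.0061, 0.0006)
df <- rep(8, length(s2))

# Prior variance and prior degrees of freedom. In a real analysis these are
# estimated from all peptides; here they are fixed by hand for illustration.
s2_prior <- mean(s2)
df_prior <- 4

# Posterior variance: weighted average of the prior and peptide-level variance
s2_post <- (df_prior * s2_prior + df * s2) / (df_prior + df)

# Shrinkage pulls extreme variances towards the prior value
round(cbind(raw = s2, moderated = s2_post), 5)
```

A peptide with an unusually small variance no longer produces an inflated test statistic, and one with an unusually large variance is not penalised as heavily.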
The main slot of interest is the results slot, which returns quantities of
interest such as p-values and FDR-adjusted p-values that account for multiple testing.
The following is the DataFrame of interest.
out@results
## DataFrame with 104 rows and 8 columns
## Fstat.Fstat Fstat.numerator Fstat.denomenator
## <list> <list> <list>
## VIWINGDKGYNG_2 0.70875 0.00109168 0.00154029
## VIWINGDKGYNGL_2 0.0329098 6.18014e-05 0.0018779
## GDKGYNGLAEVG_3 0.980103 0.00129713 0.00132346
## LAEVGKKFEKDTGIKVTVEHPDK_4 0.809885 0.00117025 0.00144496
## AEVGKKFEKDTGIKV_4 0.806449 0.00140588 0.00174329
## ... ... ... ...
## NAQKGEIMPNIPQM_2 1.74802 0.00425775 0.00243575
## NAQKGEIMPNIPQMSA_2 5.50307 0.013198 0.00239829
## NAQKGEIMPNIPQMSAF_2 7.06683 0.00431371 0.000610416
## PNIPQMSAF_1 0.905875 0.00277575 0.00306417
## SAFWYAVRTAVINAA_4 0.364667 0.00215098 0.00589847
## pvals fdr ebayes.pvals ebayes.fdr
## <numeric> <numeric> <numeric> <numeric>
## VIWINGDKGYNG_2 0.597687 0.999959 0.584744 0.999952
## VIWINGDKGYNGL_2 0.997692 0.999959 0.997487 0.999952
## GDKGYNGLAEVG_3 0.445918 0.999959 0.438269 0.999952
## LAEVGKKFEKDTGIKVTVEHPDK_4 0.536935 0.999959 0.525822 0.999952
## AEVGKKFEKDTGIKV_4 0.538920 0.999959 0.520828 0.999952
## ... ... ... ... ...
## NAQKGEIMPNIPQM_2 0.18879415 0.6333739 0.16773031 0.6229983
## NAQKGEIMPNIPQMSA_2 0.00554988 0.0443991 0.00430216 0.0394778
## NAQKGEIMPNIPQMSAF_2 0.00177921 0.0231297 0.00269021 0.0378613
## PNIPQMSAF_1 0.48384985 0.9999586 0.44999538 0.9999517
## SAFWYAVRTAVINAA_4 0.83017683 0.9999586 0.80620115 0.9999517
## fitcomplete
## <integer>
## VIWINGDKGYNG_2 1
## VIWINGDKGYNGL_2 2
## GDKGYNGLAEVG_3 3
## LAEVGKKFEKDTGIKVTVEHPDK_4 4
## AEVGKKFEKDTGIKV_4 5
## ... ...
## NAQKGEIMPNIPQM_2 100
## NAQKGEIMPNIPQMSA_2 101
## NAQKGEIMPNIPQMSAF_2 102
## PNIPQMSAF_1 103
## SAFWYAVRTAVINAA_4 104
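The fdr and ebayes.fdr columns in the table above hold p-values adjusted for multiple testing. The exact adjustment used by the package is not stated here; the sketch below assumes a Benjamini-Hochberg correction, which is available in base R through p.adjust, and uses toy p-values rather than values from the MBP analysis.

```r
# Toy p-values (hypothetical, not taken from the MBP analysis)
p <- c(0.01, 0.02, 0.03, 0.5)

# Benjamini-Hochberg adjustment from base R
fdr <- p.adjust(p, method = "BH")
fdr

# Indices of tests below a 5% false discovery rate
which(fdr < 0.05)
```

The adjusted values depend on how many tests are corrected together, so adjusted p-values from a subset of peptides are not comparable with those from the full set.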
We can now examine the peptides for which the false discovery rate is less
than 0.05:
which(out@results$ebayes.fdr < 0.05)
## AADGGYAFKYENGKY_3 FKYENGKY_3 DIKDVGVDNAGAKAGL_2
## 42 45 47
## VGVDNAGAKAGLTF_2 VDLIKNKHMNA_4 VDLIKNKHMNADTD_3
## 50 55 56
## VDLIKNKHMNADTDY_4 IDTSKVNY_2 AKDPRIAATM_3
## 57 68 96
## ENAQKGEIMPNIPQMSAF_2 NAQKGEIMPNIPQMSA_2 NAQKGEIMPNIPQMSAF_2
## 99 101 102
Let us visualise some of these examples:
res@statmodels[[42]]@vis + res@statmodels[[45]]@vis
As we can see, our model has picked up some subtle differences. We can further
visualise these using a forest plot. We can see that the functions are very similar,
as the parameters (a, b, p, d) are almost identical. However, we can see that
the deuterium differences are lower in the 10% structural variant condition.
fp <- forestPlot(params = res@statmodels[[42]])
We can produce a table of the actual numbers. We see that at all 4 timepoints
the deuterium difference is negative, though the confidence intervals overlap
with 0. Our functional approach is picking up this small but reproducible difference.
knitr::kable(fp$data)
It is also possible to visualise these plots on a different scale. Of course,
changing the natural scaling will emphasise different parts of the plot and
could distort interpretation. In particular, if a log transform is used then
care should be taken when interpreting values around 0. We suggest examining
the numerical values in a forest plot or table alongside any transformation of
the variables. We suggest using the pseudo log transform, as this allows
control of the linearity of the plot and makes clear that this is a choice
of visualisation (and not of statistical modelling). The parameter sigma
below controls the scaling factor of the linear part of the transformation.
res@statmodels[[42]]@vis + scale_x_continuous(
    trans = pseudo_log_trans(base = 10, sigma = 0.01), breaks = c(0, 10^(1:7)))

res@statmodels[[42]]@vis + scale_x_continuous(
    trans = pseudo_log_trans(base = 10, sigma = 0.0001), breaks = c(0, 10^(1:7)))

res@statmodels[[42]]@vis + scale_x_continuous(
    trans = pseudo_log_trans(base = 10, sigma = 10), breaks = c(0, 10^(1:7)))
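To make the role of sigma concrete, the transform behind pseudo_log_trans in the scales package can be written in one line of base R. The helper name `pseudo_log` below is hypothetical; the formula, asinh(x / (2 * sigma)) / log(base), is the one used by scales. The transform is linear close to 0 and approaches log(x / sigma) in the chosen base for large x.

```r
# Same formula as scales::pseudo_log_trans(): asinh(x / (2 * sigma)) / log(base)
pseudo_log <- function(x, sigma = 1, base = 10) {
  asinh(x / (2 * sigma)) / log(base)
}

x <- c(0, 30, 240, 1800, 14400)  # exposure times in seconds

# Small sigma: behaves like log10(x / sigma) for these values, with 0 mapped to 0
round(pseudo_log(x, sigma = 0.01), 2)

# Large sigma: the linear region around zero is wider
round(pseudo_log(x, sigma = 10), 2)
```

Unlike a plain log transform, the value at 0 is defined, which is why a break at 0 can be placed on the axis in the plots above.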

Let us now look at a situation where the changes are more dramatic.
res_wt <- fitUptakeKinetics(object = MBPqDF[, c(61:100)],
                            feature = rownames(MBPqDF[, c(61:100)])[[1]],
                            start = list(a = NULL, b = 0.001, d = NULL, p = 1))
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "Could not fit model, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## [1] "model fit failed, likely exessive missing values"
out_wt <- processFunctional(object = MBPqDF[, c(61:100)], params = res_wt)
We can visualise some of the results and generate plots.
res_wt@statmodels[[27]]@vis / res_wt@statmodels[[28]]@vis +
  plot_layout(guides = "collect") |
  (forestPlot(params = res_wt@statmodels[[27]], condition = c("WT", "W169G")) /
     forestPlot(params = res_wt@statmodels[[28]], condition = c("WT", "W169G")) +
     plot_layout(guides = "collect")) +
  plot_annotation(tag_levels = 'a') + plot_layout(widths = c(1, 1))



An epitope mapping experiment
We now describe the analysis of an epitope mapping experiment. Here, the data
analysis is more challenging, since only one replicate was performed in each
condition (apo and antibody). If we make some simplifying assumptions, rigorous
statistical analysis can still be performed.
The experiment was performed on HOIP-RBR. We load the data below from inside
the package:
HOIPpath <- system.file("extdata", "N64184_1a2_state.csv", package = "hdxstats")
HOIP <- read.csv(HOIPpath)
unique(HOIP$State)
## [1] "apo" "dAb13_1" "dAb13_2" "dAb25_1" "dAb25_2" "dAb27_1" "dAb27_2"
## [8] "dAb2_1" "dAb2_2" "dAb6_1" "dAb6_2"
HOIP$Exposure <- HOIP$Exposure * 60 #convert to seconds
filter(HOIP, Sequence == unique(HOIP$Sequence[1])) %>%
ggplot(aes(x = Exposure,
y = Center,
color = factor(State, unique(HOIP$State)))) +
theme_classic() + geom_point(size = 3) +
scale_color_manual(values = colorRampPalette(brewer.pal(8, name = "Set2"))(11)) +
labs(color = "experiment", x = "Deuterium Exposure", y = "Deuterium incorporation")

As before, we need to convert the data to an object of class QFeatures
for ease of analysis.
First, we put the data into a DataFrame object. Currently, it is in long format,
so we switch to a wide format:
HOIP_wide <- pivot_wider(data.frame(HOIP),
values_from = Center,
names_from = c("Exposure", "State"),
id_cols = c("Sequence"))
Now remove all columns with only NAs
HOIP_wide <- HOIP_wide[, colSums(is.na(HOIP_wide)) != nrow(HOIP_wide)]
The column names are not very informative, so we provide them in the format X(time)rep(replicate)cond(condition):
colnames(HOIP_wide)[-c(1)]
## [1] "0_apo" "30_apo" "300_apo" "0_dAb13_1" "30_dAb13_1"
## [6] "300_dAb13_1" "0_dAb13_2" "30_dAb13_2" "300_dAb13_2" "0_dAb25_1"
## [11] "30_dAb25_1" "300_dAb25_1" "0_dAb25_2" "30_dAb25_2" "300_dAb25_2"
## [16] "0_dAb27_1" "30_dAb27_1" "300_dAb27_1" "0_dAb27_2" "30_dAb27_2"
## [21] "300_dAb27_2" "0_dAb2_1" "30_dAb2_1" "300_dAb2_1" "0_dAb2_2"
## [26] "30_dAb2_2" "300_dAb2_2" "0_dAb6_1" "30_dAb6_1" "300_dAb6_1"
## [31] "0_dAb6_2" "30_dAb6_2" "300_dAb6_2"
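# prefix with X and insert the replicate label; this relies on every time point
# (0, 30, 300) ending in "0", so that "0_" marks the end of the time
new.colnames <- gsub("0_", "0rep1", paste0("X", colnames(HOIP_wide)[-c(1)]))
new.colnames <- gsub("rep1", "rep1cond", new.colnames)
# remove any % signs (none are expected in this data set)
new.colnames <- gsub("%", "", new.colnames)
# drop anything after a space (none is expected in this data set)
new.colnames <- gsub(" .*", "", new.colnames)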
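The chained substitutions above depend on every time point ending in "0". As a hedged alternative, the sketch below captures the leading digits and the condition explicitly with a single anchored pattern. It is an illustration on a few of the column names shown earlier, not the package's own method, and the variable names are ours.

```r
# A few of the column names produced by pivot_wider() above
old <- c("0_apo", "30_apo", "300_dAb13_1", "30_dAb25_2")

# Approach used in the text: relies on every time point ending in "0"
chained <- gsub("rep1", "rep1cond", gsub("0_", "0rep1", paste0("X", old)))

# Alternative (an assumption, not the package's method): capture the leading
# digits as the time and everything after the first underscore as the condition
anchored <- sub("^([0-9]+)_(.*)$", "X\\1rep1cond\\2", old)

print(anchored)
stopifnot(identical(chained, anchored))
```

Both forms give the same names here; the anchored pattern would also cope with a time point such as 45 that does not end in "0".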
Now, we can provide rownames and convert the data to a QFeatures object:
qDF <- parseDeutData(object = DataFrame(HOIP_wide),
design = new.colnames,
quantcol = 2:34,
rownames = HOIP_wide$Sequence)
As before, we can produce a heatmap. We perform a simple normalisation for
ease of visualisation:
mat <- assay(qDF)
mat <- apply(mat, 2, function(x) x - assay(qDF)[,1])
pheatmap(t(mat),
cluster_rows = FALSE,
cluster_cols = FALSE,
color = brewer.pal(n = 9, name = "BuPu"),
main = "HOIP RBR heatmap",
fontsize = 14,
legend_breaks = c(0, 2, 4, 6,8,10,12, max(assay(qDF))),
legend_labels = c("0", "2", "4", "6", "8","10", "12", "Incorporation"))
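The normalisation used for the heatmap subtracts the first column (the earliest time point) from every column. The sketch below reproduces this on a small made-up matrix and shows two equivalent base R forms; the values and peptide names are purely illustrative.

```r
# Toy uptake matrix: rows are peptides, columns are time points (made-up values)
mat <- matrix(c(1.0, 2.5, 4.0,
                0.5, 0.9, 1.7),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("pepA", "pepB"), c("X0", "X30", "X300")))

# Form used in the text: subtract the first column from every column
via_apply <- apply(mat, 2, function(x) x - mat[, 1])

# Equivalent forms: sweep() over rows, or plain recycling down the columns
via_sweep <- sweep(mat, 1, mat[, 1])
via_recycle <- mat - mat[, 1]

print(via_sweep)
```

After normalisation the first column is zero for every peptide, so the heatmap shows uptake relative to the earliest time point.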

Let us first perform a quick test:
res <- differentialUptakeKinetics(object = qDF[,1:33],
feature = rownames(qDF)[[1]][3],
start = list(a = NULL, b = 0.01, d = NULL),
formula = value ~ a * (1 - exp(-b*(timepoint))) + d)
res@vis+ scale_color_manual(values = colorRampPalette(brewer.pal(8, name = "Set2"))(11))
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.

Whilst this analysis produces good fits for the functions, there are too many
degrees of freedom to perform sound statistical analysis. Hence, we normalise
to remove the degree of freedom for the intercept. For simplicity, and to preserve
the original matrix, we reprocess the data. We then fit a simplified kinetic
model in which only the plateau is inferred (the rate is fixed at 0.05 in the
formula below).
cn <- new.colnames[c(1:3,10:12)]
HOIP_wide_nrm <- data.frame(HOIP_wide)
HOIP_wide_nrm[, c(2:4)] <- HOIP_wide_nrm[,c(2:4)] - HOIP_wide_nrm[,c(2)] # normalise by intercept
HOIP_wide_nrm[, c(11:13)] <- HOIP_wide_nrm[,c(11:13)] - HOIP_wide_nrm[,c(11)] # normalised by intercept
newqDF <- parseDeutData(object = DataFrame(HOIP_wide_nrm),
design = cn,
quantcol = c(2:4, 11:13), rownames = HOIP_wide$Sequence)
res_all <- fitUptakeKinetics(object = newqDF[,1:6],
feature = rownames(newqDF[,1:6])[[1]],
start = list(a = NULL),
formula = value ~ a * (1 - exp(-0.05*(timepoint))))
funresdAb25_1 <- processFunctional(object = newqDF[,1:6],
params = res_all)
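With the rate fixed, the simplified model value = a * (1 - exp(-0.05 * timepoint)) is linear in the single parameter a, so an unweighted least-squares estimate of the plateau has a closed form. The sketch below illustrates this on made-up normalised uptake values; it is not the package's fitting routine, and the data are hypothetical.

```r
# Made-up normalised uptake values at the time points used above (seconds)
timepoint <- c(0, 30, 300)
value <- c(0, 1.1, 1.5)

# With the rate fixed at 0.05, the model value = a * g(t) is linear in a
g <- 1 - exp(-0.05 * timepoint)

# Closed-form least-squares estimate of the plateau
a_closed <- sum(value * g) / sum(g^2)

# Same estimate from a no-intercept linear model
a_lm <- unname(coef(lm(value ~ 0 + g)))

print(c(closed_form = a_closed, lm = a_lm))
```

Because only one parameter per curve remains, each fit uses a single degree of freedom, which is what makes testing feasible with so few time points.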
We can have a look at the results:
funresdAb25_1@results
## DataFrame with 110 rows and 8 columns
## Fstat.Fstat Fstat.numerator Fstat.denomenator
## <list> <list> <list>
## GPGQECA 0.505943 0.0236133 0.0466718
## CAVCGWALPHNRM 0.819962 0.0111598 0.0136102
## CAVCGWALPHNRMQAL 3.18841 0.130744 0.0410059
## CAVCGWALPHNRMQALTSCE 1.51203 0.598316 0.395703
## AVCGWALPHNRM 0.138914 0.00401256 0.0288852
## ... ... ... ...
## ATERYLHVRPQPLAGEDPPAYQARL 0.882916 0.0634907 0.0719102
## ERYLHVRPQPLAGEDPPAYQ 0.841277 0.0671491 0.0798181
## ERYLHVRPQPLAGEDPPAYQARL 2.11026 0.172771 0.081872
## RYLHVRPQPLAGEDPPAYQARL 1.65969 0.120275 0.0724686
## LQKLTEEVPLGQSIPRRRK 4.48223 0.196504 0.0438406
## pvals fdr ebayes.pvals ebayes.fdr
## <numeric> <numeric> <numeric> <numeric>
## GPGQECA 0.516181 0.652643 0.479007 0.612683
## CAVCGWALPHNRM 0.416403 0.558589 0.406959 0.545920
## CAVCGWALPHNRMQAL 0.148709 0.320744 0.123017 0.300707
## CAVCGWALPHNRMQALTSCE 0.286210 0.499732 0.237829 0.441045
## AVCGWALPHNRM 0.728272 0.801099 0.708845 0.779730
## ... ... ... ... ...
## ATERYLHVRPQPLAGEDPPAYQARL 0.400605 0.557804 0.3563859 0.502596
## ERYLHVRPQPLAGEDPPAYQ 0.410930 0.558053 0.3658748 0.509446
## ERYLHVRPQPLAGEDPPAYQARL 0.219967 0.443454 0.1817334 0.386882
## RYLHVRPQPLAGEDPPAYQARL 0.267115 0.481684 0.2263725 0.436859
## LQKLTEEVPLGQSIPRRRK 0.101671 0.266280 0.0814249 0.246599
## fitcomplete
## <integer>
## GPGQECA 1
## CAVCGWALPHNRM 2
## CAVCGWALPHNRMQAL 3
## CAVCGWALPHNRMQALTSCE 4
## AVCGWALPHNRM 5
## ... ...
## ATERYLHVRPQPLAGEDPPAYQARL 106
## ERYLHVRPQPLAGEDPPAYQ 107
## ERYLHVRPQPLAGEDPPAYQARL 108
## RYLHVRPQPLAGEDPPAYQARL 109
## LQKLTEEVPLGQSIPRRRK 110
which(funresdAb25_1@results$ebayes.fdr < 0.05)
## IQLRESLEPDA RESLEPDAYALFHKKLTEGVL YALFHKKLTEGVL
## 36 42 43
## REQLEATCPQCHQTF EATCPQCHQTF MYLQENGIDCPKCKF
## 52 53 65
## YLQENGIDCPKCKFSYA LQENGIDCPKCKFSYA
## 68 70
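The selection step above thresholds adjusted p-values and returns named indices. The sketch below shows the same pattern on made-up p-values, assuming a Benjamini-Hochberg adjustment; the text does not state which adjustment underlies the fdr columns, so this is an assumption.

```r
# Made-up p-values for four hypothetical peptides
pvals <- c(pepA = 0.01, pepB = 0.02, pepC = 0.03, pepD = 0.5)

# Benjamini-Hochberg adjustment (assumed here; the text does not name the method)
fdr <- p.adjust(pvals, method = "BH")

# Named indices of peptides passing the 5% threshold
hits <- which(fdr < 0.05)
print(hits)
```

As in the output above, which() keeps the peptide names, so the result can be used either to index the statmodels list by position or to look up sequences by name.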
We can plot these kinetics to see what is happening. This allows us to visualise
regions of protection and deprotection, potentially identifying the epitope.
(res_all@statmodels[[36]]@vis +
res_all@statmodels[[42]]@vis +
res_all@statmodels[[43]]@vis +
res_all@statmodels[[65]]@vis +
res_all@statmodels[[68]]@vis +
res_all@statmodels[[70]]@vis +
res_all@statmodels[[52]]@vis +
res_all@statmodels[[53]]@vis ) + plot_layout(guides = 'collect')
We can make a Manhattan plot to better visualise spatially what’s happening.
# We need to provide an indication of "difference" so we can examine deprotected
# or protected regions
diffdata <- assay(newqDF)[,6] - assay(newqDF)[,3]
sigplots <- manhattenplot(params = funresdAb25_1,
sequences = HOIP$Sequence,
region = HOIP[, c("Start", "End")],
difference = diffdata,
nrow = 1)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
sigplots[[1]] + plot_layout(guides = 'collect')

We can visualise this in a peptide plot, which helps us understand the nature
of the overlap:
fpath <- system.file("extdata", "HOIP.txt", package = "hdxstats", mustWork = TRUE)
HOIPfasta <- readAAStringSet(filepath = fpath, "fasta")
scores <- funresdAb25_1@results$ebayes.fdr
out <- plotEpitopeMap(AAString = HOIPfasta[[1]],
peptideSeqs = unique(HOIP$Sequence),
numlines = 2,
maxmismatch = 1,
by = 1,
scores = 1 * (-log10(scores[unique(HOIP$Sequence)]) > -log10(0.05)) + 0.0001,
name = "significant")
## Warning in brewer.pal(n = 2, name = "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
out[[1]]/(out[[2]]) + plot_layout(guides = 'collect') & theme(legend.position = "right")
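The scores argument in the call above turns adjusted p-values into a near-binary significance flag. The sketch below, on made-up values with hypothetical peptide names, shows that the -log10 comparison is equivalent to testing the adjusted p-value against 0.05 directly and then adding the same small offset.

```r
# Made-up adjusted p-values, named by (hypothetical) peptide
scores <- c(pepA = 0.01, pepB = 0.61, pepC = 0.03)
peptides <- c("pepB", "pepA", "pepC")

# Expression used in the call above
flag_log <- 1 * (-log10(scores[peptides]) > -log10(0.05)) + 0.0001

# Equivalent, more direct form: indicator of FDR < 0.05 plus the same offset
flag_direct <- as.numeric(scores[peptides] < 0.05) + 0.0001

print(flag_log)
```

Indexing by name reorders the scores to match the peptide vector, so the flags line up with the peptides in the order they are passed to the plotting function.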

We can further visualise this as a barcode of particular residues. Here we use
residue-level averaging to obtain results at the residue level.
scores <- funresdAb25_1@results$ebayes.fdr
out2 <- plotEpitopeMapResidue(AAString = HOIPfasta[[1]],
peptideSeqs = unique(HOIP$Sequence),
numlines = 2,
maxmismatch = 1,
by = 5,
scores = scores[unique(HOIP$Sequence)],
name = "-log10 p value")
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
out2[[1]]/out2[[2]] + plot_layout(guides = 'collect') & theme(legend.position = "right")
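One plausible reading of residue-level averaging is that each residue receives the mean score of all peptides covering it. The sketch below implements that reading on toy data; the helper residue_average and the peptide positions are hypothetical, and the package's own computation may differ.

```r
# Hypothetical helper: average peptide scores over each residue they cover
residue_average <- function(start, end, score, n_residues) {
  vapply(seq_len(n_residues), function(i) {
    covering <- start <= i & i <= end
    # residues covered by no peptide get NA
    if (any(covering)) mean(score[covering]) else NA_real_
  }, numeric(1))
}

# Two overlapping toy peptides: residues 1-4 (score 1) and 3-6 (score 3)
per_residue <- residue_average(start = c(1, 3), end = c(4, 6),
                               score = c(1, 3), n_residues = 7)
print(per_residue)
```

Residues 3 and 4 lie in the overlap and take the mean of both peptide scores, which illustrates how overlapping peptides sharpen the residue-level picture.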
We can also plot multiple residue maps on the same plot so that we can compare
different antibodies.
scores <- funresdAb25_1@results$ebayes.fdr
avMap25_1 <- ComputeAverageMap(AAString = HOIPfasta[[1]],
peptideSeqs = unique(HOIP$Sequence),
numlines = 2, maxmismatch = 1,
by = 10, scores = scores[unique(HOIP$Sequence)],
name = "-log10 p value")
## generate results from other dAB
cn <- new.colnames[c(1:3,19:21)]
HOIP_wide_nrm <- data.frame(HOIP_wide)
HOIP_wide_nrm[,c(2:4)] <- HOIP_wide_nrm[,c(2:4)] - HOIP_wide_nrm[,c(2)]
HOIP_wide_nrm[,c(20:22)] <- HOIP_wide_nrm[,c(20:22)] - HOIP_wide_nrm[,c(20)]
newqDF2 <- parseDeutData(object = DataFrame(HOIP_wide_nrm),
design = cn,
quantcol = c(2:4,20:22),
rownames = HOIP_wide$Sequence)
res_all2 <- fitUptakeKinetics(object = newqDF2[,1:6],
feature = rownames(newqDF2[,1:6])[[1]],
start = list(a = NULL),
formula = value ~ a * (1 - exp(-0.07*(timepoint))))
## Warning in max(data$value): no non-missing arguments to max; returning -Inf
## Error in nlsLM(data = data, formula = formula, start = start, control = nls.lm.control(maxiter = 500, :
## parameters without starting value in 'data': value
## [1] "model fit failed, likely exessive missing values"
funresdAb27_2 <- processFunctional(object = newqDF2[,1:6],
params = res_all2)
scores <- funresdAb27_2@results$ebayes.fdr
# compute average map
avMap27_2 <- ComputeAverageMap(AAString = HOIPfasta[[1]],
peptideSeqs = unique(HOIP$Sequence),
numlines = 2,
maxmismatch = 1,
by = 10,
scores = scores[unique(HOIP$Sequence)],
name = "-log10 p value")
# set rownames
rownames(avMap25_1) <- "dAb25_1"
rownames(avMap27_2) <- "dAb27_2"
# store in a list
avMap <- list(avMap27_2 = avMap27_2,
avMap25_1 = avMap25_1)
#plotting
out3 <- plotAverageMaps(avMap, by = 20)
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
out3[[1]]/out3[[2]] + plot_layout(guides = 'collect') & theme(legend.position = "right")
